ABSTRACT

A statistical analysis of data on Michigan, New York
City, New York State, and Project Talent schools found evidence of
schools that consistently produce outstanding students even after
allowance is made for the different initial endowments of their
students and for chance variation. Methodologically, like many
previous studies, this report uses regression analysis of achievement
data, but it focuses on statistical outliers rather than central
tendencies. Three tools of analysis were used to examine the
residuals: (1) Histograms of residuals, which showed no immediate evidence
of extreme overachievers. (2) Comparisons, over different grades and
years, of the number of schools that consistently overachieved with
the number expected if all residual variation were random.
Evidence of unusually effective schools was found. (3) Comparisons of
background characteristics of the hypothesized overachieving schools
with those of the average school. Outstanding Michigan schools tended
to have smaller class sizes, more teachers earning over $11,000, and
more teachers with more than five years' experience. (Author)
ARE THERE UNUSUALLY EFFECTIVE SCHOOLS?
Robert E. Klitgaard and George Hall 



I. INTRODUCTION 

Beginning with the Coleman report and continuing through the most
recent research efforts, scholarly analysis has eroded the belief that
different school policies can lead to increases in educational achieve-
ment. Large-scale statistical studies have failed to show consistent
and important relationships between what goes on in schools and varia-
tions in student learning, as measured by cognitive achievement tests.
To most people concerned with measuring and improving school effective-
ness, these are distressing results, perhaps the most counter-intuitive
findings in public policy research in the past decade.

A number of rather drastic alternatives are open. One is to accept 
the Coleman results and declare them the fault of the entire educational 
system. On this view educational effectiveness can only come about 
through radical reform of our whole way of schooling. 

Another alternative is to reject Coleman's findings on the grounds 
that the wrong things were measured. One should stop reading the statis-
ticians and economists and start reading Plato and Dewey on the true
goals of education. 

The findings reported here are based on the authors' A Statistical
Search for Unusually Effective Schools, R-1210-CC/RC (Santa Monica: The
Rand Corporation, 1973). We are grateful to the Carnegie Corporation and
Rand for research support; to Henry Acland and the University of the State
of New York for data; and to Frank Berger and Gus Haggstrom for their advice
and assistance. The usual caveat protecting these people and institutions
from further responsibility is, of course, in order.
Or there is despair. Perhaps one should leave the educational field
and go into something like bartending, where the results are clear-cut, 
the recipients thankful, and the emoluments more gratifying. 

But there are also promising middle courses that stay in the main-
stream of educational research. Without rejecting the extreme alternatives
entirely, to us the most promising course seems to be in the middle; but 
ironically it involves getting away from central tendencies. Previous 
studies have indicated that on average school policies do not have much 
effect on measurable student outcomes. Suppose this is true. Might 
there not remain, nevertheless, a group of unusually effective schools 
that are different? Are there any exceptions to small average tendencies 
and insignificant regression coefficients? The mathematics of previous 
studies allow for such a possibility, as long as the number of exceptions 
is not large. In short, are there unusually effective schools? 

At first glance the answer may seem obvious. Considering the enormous 
diversity among the nation's public schools, it would surely be incredible 
if some were not much better than others. Furthermore, parents and children,
administrators and teachers, journalists and taxpayers seem to act as if
some schools were unusually effective. An existence theorem seems hardly
in need of proof, or even exploration. 

Clearly, schools do differ in their outcomes. Some schools consis-
tently have higher achievement scores, lower drop-out rates, more college-
bound graduates, wealthier alumni, and so forth. But these results cannot
be entirely attributed to the schools themselves. Pupils bring different
amounts of intellectual capital to their educational experiences, in the
form of different social, economic, and innate characteristics. Schools
with more "advantaged" students will tend to achieve superior results.
Furthermore, even when non-school background factors are identical among
students in different schools, random variation will ensure that some
schools will perform better than others. The question of unusually
effective schools must therefore be carefully phrased: Do some schools
consistently produce outstanding students even after allowance is made
for the different initial endowments of their students and for chance
variation?

Even if unusually effective schools were rare, they would be very
important for educational policy. So long as some exist and can be
identified, there is hope for replication of superior performance through-
out the educational system. Of course, even if exemplary schools exist,
it is a separate question whether their success can be reproduced else-
where. But if there are no unusually effective schools, we may have
to consider seriously alternatives radically different from the present
efforts of trying to discover and diffuse "best practice." We may need
to make substantial changes in educational expenditures, or we may
need to opt for some radical overhaul of the whole schooling system,
as Silberman, Illich, and others advocate. Thus, investigating the
existence of unusually effective schools is not merely a matter of scien-
tific curiosity, but is a necessary foundation for a rational public policy
towards educational improvement.

The scope of this study is limited in two ways. First, we have
defined school outcomes in terms of student performance on standardized
reading and mathematics achievement tests. The whole question of
defining "educational effectiveness" is in a sense logically prior to the
search for unusually effective schools; yet we do not claim to have
"solved" that problem. (It may be no more soluble than the question
"What sort of house is best?") Our reliance on achievement data is
not merely the result of their greater availability, for we feel that such
scores can reflect progress toward some valid educational objectives.
But it goes without saying that test results can only be part of the
story. Our paper is exploratory and conditional: if one takes achieve-
ment scores as the measure of success, is there any evidence that some
schools are exceptionally successful?

The second limitation involves the questions we do not answer.
There is a multitude of interesting and policy-relevant questions that
can be asked about unusually effective schools. But as Sherlock Holmes
properly told Henry Baskerville, the prior question is, "Does the beast
exist?" The null hypothesis asserts that there are no exemplary
schools. If we can discover evidence that there are, we shall leave to
further researchers the detailed and important tasks of discovering why
such schools exist, and how (if at all) their success can be copied.
II. PREVIOUS STUDIES

Surprisingly little research has addressed the question of unusually
effective schools. Scholarly analysis has concentrated on the average
effects of all school policies on educational outcomes. After con-
trolling for student background factors, the effects of different school
policies have been found to be about the same on average. The anecdotal
and case-study literature is replete with stories of educational suc-
cesses, but it concentrates mostly on programs rather than schools, is
open to suspicion of advocacy bias, and seldom includes any data. The
question of unusual schools has generally gone unexamined, with a few
exceptions.

Part of Shaycoft's analysis of Project Talent retest data was aimed
at finding out whether schools differed in their ninth-to-twelfth-grade
"growth rates." Not surprisingly, she found differences; but she did
not control for socio-economic status (SES) or other background factors.
The existence of outliers was not studied. Her study therefore did not
establish that the different growth rates were due to school factors:
perhaps the results were merely due to random variation and to differences
in non-school variables.

In their seminal work on inequality and education, Jencks and his
associates provided many important analyses of school impacts. Some
of their findings have immediate relevance for the question of unusually
effective schools--for instance, their studies of the very narrow range
of outcomes one observes among schools after controlling for various non-
school factors. But they did not apply the statistical tools required
to determine the presence of exceptional performers.

Jencks et al. regressed school achievement scores against student
background factors. The difference between the school's observed
average score and the one predicted by the regression equation was the
measure of whether a school was an overachiever or an underachiever. To
see if there were consistent overachievers, they correlated the resi-
duals of all schools over time. The results were unanimous: the
residuals never showed a high correlation.

Correlation analysis, however, is a poor method for detecting out-
liers. Variations that occur throughout the entire population of
schools can drown out the consistency we are interested in--that among
the highest overachievers. The correlation coefficient is a measure of
the strength of the linear relationship between two random variables.
The relationship among the residuals (or even among the highest ones)
may not be linear, yet some schools may be persistent overachievers.
Even if there is no consistent tendency for all overachievers to remain
that way, some may. Thus, despite the thorough and path-breaking nature
of most of their work, Jencks et al. do not really come to grips with our
question.
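A small simulation makes the point concrete. It is a sketch under invented assumptions, none taken from the studies discussed: 980 hypothetical schools whose residuals are independent standard-normal noise in each of two years, plus 20 planted schools sitting 2.5 standard deviations above the mean in both years. The helper names `pearson_r` and `above_one_sd` are illustrative only.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def above_one_sd(values):
    """1 for each value more than one (population) SD above the mean, else 0."""
    m, s = statistics.fmean(values), statistics.pstdev(values)
    return [1 if v > m + s else 0 for v in values]


# Hypothetical setup: 980 schools whose residuals are pure noise each year,
# plus 20 planted schools that sit 2.5 SD above the mean in both years.
rng = random.Random(0)
N_NOISE, N_STARS, STAR_EFFECT = 980, 20, 2.5
year1 = [rng.gauss(0.0, 1.0) for _ in range(N_NOISE)] + [STAR_EFFECT] * N_STARS
year2 = [rng.gauss(0.0, 1.0) for _ in range(N_NOISE)] + [STAR_EFFECT] * N_STARS

r = pearson_r(year1, year2)
flags1, flags2 = above_one_sd(year1), above_one_sd(year2)
# The planted schools are the last N_STARS entries of each list.
stars_flagged = sum(a * b for a, b in zip(flags1[-N_STARS:], flags2[-N_STARS:]))
print(f"year-to-year r = {r:.2f}; planted schools above 1 SD both years: {stars_flagged}/{N_STARS}")
```

The year-to-year correlation stays small (well under 0.3 here), yet every planted school is above one standard deviation in both years: a low correlation does not rule out a few persistent overachievers.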

An unpublished Office of Education study has come the closest to
addressing our problem. In 1968 Fetters, Connors, and Smith reana-
lyzed the Coleman data and compared the over- and underachieving schools.
Figure 1 reproduces a histogram of residuals from their regression of
achievement scores against various background measures for 2392 schools.
Merely plotting the residuals in this fashion constitutes an important
step, as one can now begin to look for evidence about the tails of the
distribution and not just its central tendency. (Notice how the right
tail in Figure 1 straggles: this may be a sign that there are some very
exceptional performers.) But the authors went further. They compared
the top 100 and bottom 100 schools, ranked by their residuals, for many
input and situational characteristics. The overachieving schools
tended, for example, to have more parental interest, more and better
instructional equipment, smaller classes, fewer culturally and economi-
cally disadvantaged students (even after controlling for SES in the
regression), less disciplinary difficulty, a better "general reputation"
in the eyes of the schools' own principals, more white teachers, and a
location away from industrial suburbs or the inner city.

The OE study had two important implications. First, the variables
that educators had always supposed were important did distinguish
between the overachieving and underachieving schools, despite the 
failure of these input variables to account for much variation over all 
the schools in the Coleman data. Second, the top 100 schools appa- 
rently were not on top just by chance. The fact that many school vari- 
ables were significantly different between the two sets of schools is 
powerful evidence that the position of the top 100 schools was not a 
mere statistical artifact. 
III. METHODOLOGY

Like many previous studies that used achievement scores as a proxi- 
mate measure of school results, our basic statistical tool is regression 
analysis. Unlike past studies, however, we are not looking for global 
relationships, so we care less about characteristics of all schools and 
more about features of some of them. Consequently, we adopt a different 
approach to the regressions. 

• Instead of concentrating on the properties of the regression
line, the percentage of variation explained (R²), and the coefficients
of the regressor variables, we shall pay special attention to the resi-
duals from the regression line.

• Instead of explicitly including school variables in
the regression equation, we shall control only for non-school back-
ground variables and implicitly assume that what is left over after
such a fit represents school effectiveness (and random variation).
School effectiveness in most past studies has been measured by the size
and significance of the regression coefficients of the school variables.

• Instead of including an abundance of regressor variables to ex-
plain as much variation as possible, we shall try to avoid over-
controlling.
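The first two departures can be sketched for the simplest case, a single background regressor such as the SES index used later for Project Talent. This is a minimal sketch assuming one regressor and ordinary least squares; the helper names and the seven pairs of school means are hypothetical, not data from the study. Everything the fit leaves unexplained is kept as the residual, expressed in standard-deviation units.

```python
import statistics


def ols_residuals(y, x):
    """Fit y = a + b*x by ordinary least squares; return (a, b, residuals)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx                     # the fitted line passes through the means
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, residuals


def standardize(residuals):
    """Express residuals in units of their own (population) standard deviation."""
    s = statistics.pstdev(residuals)
    return [e / s for e in residuals]


# Hypothetical school means (background index, achievement); not data from the study.
ses = [90, 92, 95, 97, 99, 101, 103]
score = [430, 445, 452, 470, 468, 505, 490]

a, b, resid = ols_residuals(score, ses)
z = standardize(resid)   # read as "school effect + chance" under the approach above
for s_i, z_i in zip(ses, z):
    print(s_i, round(z_i, 2))
```

Adding more background regressors would shrink these residuals further, which is the overcontrolling risk the third point warns against.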

Three reasons dictate these departures from previous practice.
First, studies have shown that educational achievement is largely
determined by non-school factors. This means that both school
effects and purely random fluctuation have been rather small, so the
practice of identifying school effectiveness with the residuals is not
too dangerous. Residual variation could arise from a wide variety of
causes besides school differences: imperfections of measurement,
misspecification of the background factors, omitted variables, the
choice of fitting technique, incomplete data, and the combined random
fluctuations involved in all the regressor variables. But previous
studies, by dint of their high R²s, imply that such errors are not
likely to be large. This does not mean, as we shall see, that we can
attribute residual effects solely to schools, but from past experience
we take comfort in expecting systematic errors to be small.

The second reason stems from possible intercorrelation between
school and background variables. If these variables suffer from multi-
collinearity or somehow have a joint effect which cannot be attri-
buted to school or background alone, judging the true impact of
schools becomes well-nigh impossible. One might reason that since we
are looking for outstanding schools that are replicable, we ought to
run two-stage least-squares regressions or specifically include an inter-
action term in the regression. That way, we would not call anything a
"school effect" that was inextricably bound up with the background
factors of the school. But this argument is inappropriate here. We
do not want to prejudge the replicability question. We do not want to
eliminate school effects which are intercorrelated with background
effects. Furthermore, there is no convincing model of what variables
should be included to capture the entire school effect. Thus, we shall
use ordinary least squares and be wary of controlling for too many back-
ground factors, which might "drown out" the school effects.

The third reason we adopt our approach to regression results stems 
from the implications of accepting our null hypothesis. If there are 
no unusually effective schools, there are serious consequences for
educational policy. The importance of affirming the null hypothesis
means we want to be very sure that we do not accept it when it is false 
(we want to avoid a Type II error). If we control for a large number of 
background variables, there is an increased chance that through statis- 
tical interactions real outliers will not show up. Controlling for too 
few variables runs the risk of identifying "outliers" that could be ex- 
plained by some missing regressor. However, finding no outliers under 
such circumstances would be a very strong result indeed. The best 
strategy, given the nature of our problem, is to allow exceptional 
schools every chance to evidence themselves by calling the entire resi- 
dual the school's effect, even though this imparts an upward bias to 
the estimate, and by avoiding the risks of overcontrolling. 

One implication of our approach is that it will be very difficult 
to say that outliers are the result of unusually effective schools. 
They may merely be the product of chance perturbations or various kinds 
of statistical errors. But our task may be likened to that of a detec- 
tive, in contrast to the role of a judge. The detective searches for 
clues, the judge evaluates them. Our task is finding prima facie 
evidence that unusually effective schools exist, not proving their 
existence beyond the shadow of a doubt. If we do pinpoint some likely
candidates for exceptional schools, we must realize that only after
they are studied in a detailed fashion can the verdict come in.

Basically, the task is to find outliers in achievement scores that
are not explained by non-school factors or random variation. Histograms
of the residuals from a regression of school achievement scores on back-
ground factors, as in Figure 1, provide a good starting point. Histo-
grams allow easy visual inspection for "lumpiness" in the distribution
or unusual tails, both of which have relevance to the question of unusu-
ally effective schools. "Lumps" would show that groups of schools are
massed together in a discontinuous fashion, which may be a clue that
different educational "technologies" or procedures are being used in
different schools. The right tail of the histogram is of keen interest.
If it is very thick, it may imply that more schools than one would ex-
pect (on the basis of a normal distribution) are performing far above
average. A long tail, stretching out to four, five, and six standard
deviations above the mean, is evidence that some schools are extremely
high achievers. Neither "lumpiness" nor an unusual right tail would con-
stitute conclusive evidence of anything, but they would provide inter-
esting clues of where to concentrate our attention.
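Judging a tail requires knowing how many schools a normal distribution would put there. A minimal sketch, assuming exactly normal residuals and borrowing the 2392 schools of the Office of Education reanalysis only as an illustrative sample size; `expected_beyond` is a hypothetical helper.

```python
from statistics import NormalDist


def expected_beyond(n_schools, k):
    """Expected number of schools more than k SDs above the mean if the
    residuals were exactly normal. Uses cdf(-k) = 1 - cdf(k) to avoid
    subtracting two nearly equal numbers far out in the tail."""
    return n_schools * NormalDist().cdf(-k)


# Sample size borrowed from the Office of Education reanalysis (2392 schools).
for k in range(1, 7):
    print(f"> {k} SD: {expected_beyond(2392, k):.4g} schools expected")
```

Under normality only about three of 2392 schools should lie beyond three standard deviations, and fewer than one-tenth of a school beyond four, so even a handful at four to six standard deviations would be striking.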

The second tool involves looking at a series of distributions of
residuals. Each individual distribution (say, for schools in a particular
year) will show the effects of random variation. A series of
distributions (over many years) showing the same schools with scores
consistently some distance above the mean provides fairly strong evidence
that those schools are unusual and deserve a closer look.

The null hypothesis says that all the variation in a particular dis- 
tribution of residuals is a result of chance and not school effectiveness. 
This implies that residuals will not be correlated from year to year (as 
Jencks et al. confirmed). What we would like is some sort of "cumulative
distribution" of how well schools have done over many distributions, after
controlling for background factors. Then we could see if that distribu-
tion was significantly different from a theoretical distribution obtained
by treating all the individual distributions of residuals as statistically
independent.

We used a proxy for this cumulative distribution. All schools in a 
given distribution (for a particular year, say) were assigned a one if 
they were more than one standard deviation above the mean and a zero 
otherwise. Then each school's totals were added up over all the years 
considered, and we tested whether more schools were consistently above
one standard deviation than chance would predict.
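The proxy can be written down directly. This sketch assumes that every school reports in every distribution and that the mean and standard deviation (population form) are computed within each grade/year distribution; the function names and the five-school example are hypothetical.

```python
import statistics
from collections import Counter


def above_mean_plus_sd(residuals):
    """1 if a school's residual is more than one (population) SD above the
    mean of its own distribution, else 0."""
    m, s = statistics.fmean(residuals), statistics.pstdev(residuals)
    return [1 if e > m + s else 0 for e in residuals]


def tally(distributions):
    """distributions: one list of residuals per grade/year, schools in the
    same order in every list. Returns (per-school totals, frequency of totals)."""
    flags = [above_mean_plus_sd(d) for d in distributions]
    per_school = [sum(col) for col in zip(*flags)]
    return per_school, Counter(per_school)


# Hypothetical residuals for five schools in four distributions.
dists = [
    [3, 0, 0, 0, -3],
    [3, 0, 0, -3, 0],
    [0, 3, 0, 0, -3],
    [3, 0, -3, 0, 0],
]
per_school, freq = tally(dists)
print(per_school)            # [3, 1, 0, 0, 0]
print(sorted(freq.items()))  # [(0, 3), (1, 1), (3, 1)]
```

The frequency table of totals is the observed "cumulative distribution" that is then set against its chance expectation.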

To illustrate, assume a set of data for schools for the fourth grade
during four successive years. The calculations of the proxy for the
cumulative distribution are given in Figure 2, steps 1 and 2. Step 3
computes the theoretical distribution, using the binomial distribution
and, in this case, a (constant) probability of 0.16 that a school would
be more than one standard deviation above the mean in any one distri-
bution (roughly the share of a normal distribution lying more than one
standard deviation above its mean). Step 4 compares the actual and
expected distributions using the Chi-square test for goodness of fit.
In this hypothetical case, the null hypothesis could not be rejected at
the 0.05 level.
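Steps 3 and 4 can be sketched with Python's standard library. This is a minimal sketch under the null hypothesis described above: a constant probability p, independent distributions, and the sparse top categories pooled; the helper names are hypothetical. Because Figure 2 is not available here, the example uses the counts as we read them from the New York City panel of Table 2 (grades 3-6, 1968, four distributions) with the probability of about 0.12 given in that table's note. It reproduces the expected counts 261, 142, 29, and 3, but its statistic (close to 40) is larger than the 33.6 printed in the table; how the original rounded or pooled is not documented here.

```python
from math import comb


def binomial_expected(n_schools, n_dists, p, pool_from):
    """Expected number of schools above one SD exactly 0, 1, ... times under the
    null hypothesis (independent distributions, constant probability p), with all
    counts >= pool_from pooled into the last category."""
    probs = [comb(n_dists, k) * p**k * (1 - p) ** (n_dists - k) for k in range(n_dists + 1)]
    pooled = probs[:pool_from] + [sum(probs[pool_from:])]
    return [n_schools * q for q in pooled]


def chi_square(observed, expected):
    """Pearson goodness-of-fit statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts as read from Table 2 (New York City, grades 3-6, 1968): schools above
# one SD 0, 1, 2, and 3-or-more times out of four distributions.
observed = [280, 111, 32, 12]
expected = binomial_expected(sum(observed), 4, 0.12, pool_from=3)
stat = chi_square(observed, expected)
CRITICAL_05_DF3 = 7.815   # tabulated 0.05 critical value of chi-square, 3 df
print([round(e) for e in expected], round(stat, 1), stat > CRITICAL_05_DF3)
```

With four categories and p taken as given, the test has 3 degrees of freedom, matching the table; either value of the statistic lies far beyond the 0.05 critical value.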

If some schools do appear to be outliers, it is important to see
how they differ from the average school. Since in this paper we are
only trying to discover if such schools exist and not why, the point of
the comparison is not to uncover causal mechanisms, although we may
find some clues. The goal is to separate random outliers from non-
random ones. If many school-related characteristics of the top perfor-
mers differ from those of the average school, it will provide strong con-
firmation that we have indeed located something worthy of detailed
study, and not merely a statistical quirk. On the other hand, if the
only differences are in non-school factors, the outliers may be the
result of an omitted variable or heteroscedasticity in one of the
regressors.
IV. RESEARCH RESULTS

Data from three separate sources were analyzed. One was the
1969-70 and 1970-71 Michigan State school file, encompassing the fourth
and seventh grades of approximately 90 percent of the state's public
schools. A second involved New York City school data from 1967 to
1971, grades 2 through 6. Finally, we looked at a set of 858 schools
from the Project Talent high school data of 1960.

The regression equations differed from data set to data set, and
we experimented with a variety of fits within the Michigan data. The
Michigan equations reported here employed regressor variables of SES
(derived from a student questionnaire), percent minority enrollment in
the school, and community type (five categories). In the New York City
data we controlled the school's mean reading score in grade k and year
m for its score in grade k-1 and year m-1. Thus, for example, the
fourth-grade score for 1968 was regressed against the third-grade score
in 1967, providing a kind of measure of the students' growth from one
year to the next. For Project Talent, we regressed ninth- and eleventh-
grade composite achievement scores against an SES index. The regression
results appear in Table 1.

The first surprising result was how normal the individual histograms
of residuals looked for all three data sources. They were
all unimodally massed around the zero mean, showed no consistent or
large skewness, evidenced no discontinuities, and had very well-behaved
tails. The only exception was one of the Michigan series (the
regressions including rural schools), which showed some slight but
perhaps inconsequential thickening of the right tails. (The most
deviant of these is shown in Figure 3.) We found no immediate evidence
for discontinuous educational technologies or for the existence of a
few extremely high-achieving schools.

Table 1
REGRESSION RESULTS

[Michigan and New York City regression panels: largely illegible in the source.]

NOTE: The equation label for grade 3, 1968 (illegible in the source)
refers to the regression of third-grade scores in 1968 against
second-grade scores in 1967. The other labels are interpreted similarly.

PROJECT TALENT REGRESSIONS

Test                          Equation                  F-ratio   R²     Standard Error   No. of Schools   Mean Y   S.D. Y   Mean SES   S.D. SES
9th-grade General Aptitude    Y = -76.37 + 5.56(SES)    307.8     0.29   66.4             746              453.7    79.0     95.2       7.7
11th-grade General Aptitude   Y = -215.35 + 7.36(SES)   429.6     0.34   72.1             820              493.1    89.0     96.2       7.1
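The Project Talent figures in Table 1 can be cross-checked, since with a single regressor and an intercept the F-ratio and R² are tied by F = (R²/(1 - R²))(n - 2). A sketch, assuming each F-ratio is the usual overall F test with 1 and n - 2 degrees of freedom; `r_squared_from_f` is a hypothetical helper.

```python
def r_squared_from_f(f_ratio, n):
    """One regressor plus intercept: F = R^2/(1 - R^2) * (n - 2),
    so R^2 = F / (F + n - 2)."""
    return f_ratio / (f_ratio + n - 2)


# (F-ratio, number of schools) for the two Project Talent equations.
rows = [(307.8, 746), (429.6, 820)]
print([round(r_squared_from_f(f, n), 2) for f, n in rows])   # [0.29, 0.34]

# An OLS line passes through the means: intercept + slope * mean(SES) ~ mean(Y).
print(round(-215.35 + 7.36 * 96.2, 1))   # 492.7, against a reported mean of 493.1
```

Both checks agree with the eleventh-grade row to within rounding of the published means.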

The results from looking at series of such distributions of resi- 
duals were more suggestive, although quite mixed. The Chi-square 
analysis results are provided in Table 2. They can be summarized as 
follows: 

1. The Michigan data provide some evidence of unusually effective
schools.

a. Counting rural schools, the Chi-square tests showed
more consistently overachieving schools than chance alone
would allow. For example, among the 161 schools that reported
scores for all eight grade-year-test combinations, 15 were at least
one standard deviation above the mean six out of eight times (less
than one was expected by chance). Restating these results,
about 9 percent of the schools seemed able to raise their students
on average by an amount equal to an increase from the 50th to the
72nd percentile, given equal background factors. However, we
found that most of these outstanding schools were rural and all
white, even after controlling for community type and percent mino-
rity, which evidences heteroscedasticity in the control variables.
By running regressions stratified on community type, we found that
our regressor variables could only explain 7 percent of the
variation among rural schools, compared to 50-60 percent for the
other four community types combined. This may imply something
about the nature of rural schools, or it may be a result of imperfect
measures for SES.

Table 2
RESULTS OF CHI-SQUARE TESTS

MICHIGAN SCHOOLS

[Two panels, "Schools Reporting 8 Times" and "Schools Reporting 4 Times,"
each giving, by the number of distributions in which a school was more
than one standard deviation above the mean (No. >1), the observed and
expected numbers of schools. Most entries are illegible in the source.
Legible entries include 72 schools observed, against 13 expected, above
one standard deviation all four times in the 4-times panel; X² = 387.2;
and 4 degrees of freedom for the 4-times panel.]

NOTE: The Chi-square statistics are significant at the 0.005 level.

NEW YORK CITY ELEMENTARY SCHOOLS

          Grades 3-6, 1968          Grades 3-6, 1970
No. >1    Observed   Expected       Observed   Expected
0         280        261            266        248
1         111        142            113        133
2          32         29             28         23
3          12          3              7          2
          X² = 33.6, df = 3         X² = 10.4, df = 3

          Grade 5, 1967-71          Grade 3, 1967-71
No. >1    Observed   Expected       Observed      Expected
0         334        328            [illegible]   344
1         158        179            157           187
2          49         37             33            38
3           6          4             12             7
          X² = 7.8, df = 3          X² = 21 (partly illegible), df = [illegible]

NOTE: The probability of a school exceeding one standard deviation
above the mean was approximately 0.12 for each grade/year distribution.
An asterisk (*) indicates no significance at the 0.05 level. A dagger (†)
indicates significance at the 0.005 level. A double dagger (‡) indicates
significance at the 0.025 level.

PROJECT TALENT SCHOOLS

Grades 9 and 11, General Aptitude
No. >1    Observed   Expected
0         544        541
1         149        156
2          15         11
X² = [illegible], Degrees of Freedom = 2

NOTE: The Chi-square statistic is not significant at the 0.05 level.

b. Not including the rural schools, we also found evidence of
consistent overachievers. For example, among the 2131 schools that
reported scores for four grade-year-test combinations, 72 were at
least one standard deviation above the mean all four times (13 were
expected by chance). In other words, about 2 1/2 percent of these
schools seemed able to move their students an amount equivalent to
an increase from the 50th to the 65th percentile, given equal back-
ground factors.

Furthermore, these 72 schools turned out to be significantly
different from the average non-rural school on three out of four
school-related factors. Table 3 shows that the top 72 schools
tended to have smaller classes, more teachers with five or more
years of experience, and more teachers earning $11,000 or more.
Despite some significant differences in the number of children tested
in the fourth grade, different sample sizes could not account for
the position of the top 72 schools. Neither could differences in
non-school factors, although it was interesting to note that the
overachieving schools were slightly lower than average in SES. The
overachievers tended to be located more in northern Michigan than the
average; once again, despite eliminating rural schools, this may be
evidence for some regional/rural factor contributing to unusual
effectiveness.
2. For the New York City data, the results were equivocal. We
examined two years over four grades (1968 and 1970), and two grades
over four years (third and fifth). Although in one year and for one
grade we found some evidence of consistent overachievers, in the
other year and grade it seemed that random variation could account for
almost all the outliers observed. Furthermore, the consistent over-
achievers that were identified averaged only 1.5 inter-school standard
deviations above the mean, less than in the Michigan schools.
Very few schools indeed were above one standard deviation every time.

3. The Project Talent data showed no evidence of consistently
overachieving schools apart from what chance alone would predict. This
negative finding seems even stronger when one considers that only SES
was used as a regressor.
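The percentile statements can be translated into inter-student standard deviations if one assumes, as a sketch, that student scores are normally distributed; the paper does not state the conversion it used, and the helper names are hypothetical. Under that assumption the 50th-to-72nd and 50th-to-65th percentile moves correspond to shifts of roughly 0.58 and 0.39 standard deviations, in line with the four- to six-tenths of an inter-student standard deviation cited in the discussion.

```python
from statistics import NormalDist


def shift_for_percentile(pct):
    """SD shift that moves a student at the 50th percentile of a normal
    distribution up to percentile pct."""
    return NormalDist().inv_cdf(pct / 100)


def percentile_after_shift(sd_shift):
    """Percentile reached from the median after moving up sd_shift SDs."""
    return 100 * NormalDist().cdf(sd_shift)


print(round(shift_for_percentile(72), 2))   # 0.58: Michigan, rural schools included
print(round(shift_for_percentile(65), 2))   # 0.39: Michigan, rural schools excluded
```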

In addition to looking for unusually effective schools, we took a
brief look at two other levels of aggregation. Are there unusually
effective districts? Using regressions by the University of the State
of New York on 1969-70 and 1970-71 New York district scores for reading
and mathematics,[22] we found some very suggestive evidence for
outstanding districts. Among the 627 districts we studied, 30 were above
one standard deviation at least five out of eight times, while fewer
than 4 districts were expected by chance. Unfortunately, the University
of the State of New York regressions did not provide information that
would allow us to gauge how far these districts were able to raise
their students' scores in inter-student or percentile terms.
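The phrase "expected by chance" can be made concrete with a simplified calculation. The sketch below assumes, purely for illustration, that each of a district's eight residuals is an independent standard normal draw, so each exceeds one standard deviation with probability of about 0.159. The appendix notes that reading and mathematics residuals are in fact correlated and uses a pair-based null instead, so this is not the study's own procedure.

```python
from math import comb
from statistics import NormalDist


def prob_at_least(k_min: int, trials: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(k_min, trials + 1)
    )


# Probability that one standard-normal residual lies more than 1 SD above the mean.
p_exceed = 1 - NormalDist().cdf(1.0)

# Simplifying assumption (not the study's method): all eight residuals independent.
p_district = prob_at_least(5, 8, p_exceed)
expected_districts = 627 * p_district

print(f"P(at least 5 of 8 above 1 SD) = {p_district:.4f}")
print(f"districts expected by chance  = {expected_districts:.2f}")
```

Under these assumptions the expected count comes out at roughly 2.3 districts, consistent with the text's "fewer than 4", and far below the 30 observed.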




We also looked for unusually effective grades. Perhaps an entire
school is not outstanding, but certain of its grades are. However,
there was little evidence to encourage further investigation of this
hypothesis. The New York City results have already been discussed;
there we looked at schools' third and fifth grades over time and found
little evidence of consistent overachievers. No fifth grades seemed
unusually effective; 2.1 percent of the third grades seemed consistently
able to raise their students about half a grade level above what would
have been expected given their second-grade scores.

We also analyzed the Michigan data including rural schools to see
whether grade effects seemed greater than school effects in both
grades 4 and 7. Although there were more outstanding fourth and seventh
grades than chance would predict, the number was consistent with the
notion that school effects rather than grade effects accounted for
these outliers.

Other levels of aggregation could of course be imagined; specifi- 
cally, it would be of great interest to look for unusually effective 
teachers. The district findings do seem suggestive, and perhaps the 
search for unusual educational success should look both above and 
below the school level, at districts and classrooms. 



V. DISCUSSION 

Jencks and others have shown how tight the distribution of school
achievement scores is once one controls for non-school background
factors that influence such scores. Our results support that finding.
We discovered no school that was consistently able to raise its
students' achievement scores more than about eight-tenths of an
inter-student standard deviation.[23] When we did identify a group of
overachieving schools, they comprised from 2 to 9 percent of the sample
and averaged about four- to six-tenths of an inter-student standard
deviation above the mean per test.[24] These schools were statistically
"unusual," but whether they were unusually effective depends on one's
subjective scale of magnitude. It is also important to recall that we
allowed "school effectiveness" to include all the variation in the
residuals, not just that which could be strictly allotted to explicit
school coefficients, so that our estimates of the school impacts are
upwardly biased.

Nonetheless, moving away from average effects of schools does seem
a worthwhile step. It appears that we have located schools deserving
of further, more detailed study. It is probably also worthwhile to
begin looking for unusually effective school districts and classrooms,
and the methodology developed in this paper should prove useful in such
efforts. As educational researchers continue to develop new measures
of school outcomes, and as they begin focusing on types of students
rather than school means, they should remember that most statistical
techniques concentrate on the average effects of all schools. For both
policy and research purposes, however, exceptions to the rule may be
more important.
... arose with respect to rural schools in the Michigan data, as
will be discussed below.
... eight residuals were generated:

Since the R-M residuals for a given year and grade are not
independent, we reworded the null hypothesis to posit that the pairs
of scores are independent.
Let $X_i$ be the number of scores in a school's reading-mathematics
pair $(R_i, M_i)$ that exceed one standard deviation above the mean;
$X_i$ has the possible values 0, 1, and 2. Now compute a total score
$T_j$ for each school $j$, where $T_j = X_1 + X_2 + \cdots + X_{c_j}$
($c_j$ is the number of pairs of scores the school reported). Assuming
the $X_i$ are independent, one can compute null distributions for
$T_j$[3] using the actual probabilities of 0, 1, and 2 successes per
pair. Then the actual distribution can be compared to the null
distribution using a Chi-square test.
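The null distribution of $T_j$ described above is the convolution of the per-pair distributions of $X_i$. The sketch below is one way to compute it and the Pearson Chi-square statistic; it is an illustration, not the authors' code. The pair probabilities are the Michigan fourth-grade values from the table that follows, but the school configuration (one fourth-grade pair in each of two years), the sample size of 1000, and the observed counts are hypothetical placeholders, not data from the study.

```python
def null_distribution(pair_probs):
    """Distribution of T = X_1 + ... + X_c, where X_i takes the values 0, 1, 2
    with probabilities pair_probs[i] = (P0, P1, P2); pairs assumed independent."""
    dist = [1.0]  # before any pair is added, T = 0 with certainty
    for probs in pair_probs:
        new = [0.0] * (len(dist) + len(probs) - 1)
        for t, p_t in enumerate(dist):
            for x, p_x in enumerate(probs):
                new[t + x] += p_t * p_x  # convolve: P(T + X = t + x)
        dist = new
    return dist


def chi_square(observed, expected):
    """Pearson Chi-square statistic: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical school reporting the fourth-grade pair in 69-70 and in 70-71.
dist = null_distribution([(0.808, 0.104, 0.088), (0.806, 0.112, 0.082)])

n_schools = 1000                       # hypothetical sample size
expected = [n_schools * p for p in dist]
observed = [640, 170, 150, 25, 15]     # made-up counts, for illustration only

print("P(T = t):", [round(p, 6) for p in dist])
print("expected counts:", [round(e, 1) for e in expected])
print("Chi-square:", round(chi_square(observed, expected), 2))
```

Because the two years have different probabilities, the convolution handles non-identical pairs directly; for example, $P(T_j = 4)$ is simply $0.088 \times 0.082$. In practice, cells with small expectations would be pooled before computing the statistic.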



The actual probabilities for the Michigan pairs were:

Grade and year             N  P(X=0)  P(X=1)  P(X=2)
Fourth grade, 69-70     1836   0.808   0.104   0.088
Seventh grade, 69-70     480   0.831   0.092   0.077
Fourth grade, 70-71     1891   0.806   0.112   0.082
Seventh grade, 70-71     530   0.832   0.083   0.085
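These probabilities show why the null hypothesis was reworded to treat pairs, not individual scores, as independent. The sketch below is an illustrative check, not a computation reported in the paper. It recovers the average probability that a single score exceeds one standard deviation, $(P_1 + 2P_2)/2$, and squares it to get the $P(X=2)$ that independent reading and mathematics scores would produce. By the arithmetic-geometric mean inequality, this squared average is an upper bound on the independent value, so the comparison is conservative.

```python
# Rows of the Michigan table: (label, N, P(X=0), P(X=1), P(X=2)).
rows = [
    ("Fourth grade, 69-70", 1836, 0.808, 0.104, 0.088),
    ("Seventh grade, 69-70", 480, 0.831, 0.092, 0.077),
    ("Fourth grade, 70-71", 1891, 0.806, 0.112, 0.082),
    ("Seventh grade, 70-71", 530, 0.832, 0.083, 0.085),
]

results = []
for label, n, p0, p1, p2 in rows:
    # Average probability that one score in the pair exceeds 1 SD: E[X] / 2.
    p_single = (p1 + 2 * p2) / 2
    # P(X=2) if reading and mathematics exceedances were independent
    # (an upper bound when the two subjects' rates differ).
    p2_if_independent = p_single ** 2
    results.append((label, p0 + p1 + p2, p2, p2_if_independent))
    print(f"{label}: observed P(X=2) = {p2:.3f}, "
          f"independent would give at most {p2_if_independent:.3f}")
```

In every row the observed chance of both scores exceeding one standard deviation is roughly four to five times the independent value, which is the dependence the pair-based null is designed to absorb.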



A similar procedure was used for the New York district data.

One final note about the computation of the Chi-square statistic.
In contingency tables with more than one degree of freedom, one must
pool cells with small expectations in order that the Chi-square
approximation be accurate. Throughout our investigations we followed
a pooling rule proposed by Yarnold:

If the number of classes $s$ is three or more, and if $r$ denotes
the number of expectations less than five, then the minimum
expectation may be as small as $5r/s$.

(Yarnold, James K., "The Minimum Expectation in $\chi^2$ Goodness of Fit
Tests and the Accuracy of Approximations for the Null Distribution,"
Journal of the American Statistical Association, Vol. 65, No. 330,
June 1970.)
19. The 15 schools comprise about 9 percent of the 161 that reported
test scores eight times. These schools averaged two standard
deviations above the inter-school mean on each test. The standard
error of the regressions ranged between 2.38 and 3.94, meaning
that two standard deviations were around 5-6 test points. The tests
are standardized to have a mean of 50 and an inter-student standard
deviation of 10; 5-6 points is therefore between five- and six-tenths
of an inter-student standard deviation, which implies a change on
average from the 50th percentile to about the 72nd.
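The percentile figure in note 19 follows from the test scale. The sketch below assumes, as the note's percentile conversion implicitly does, that student scores are roughly normal with mean 50 and inter-student standard deviation 10. It then asks where a student starting at the 50th percentile lands after a gain of 5 or 6 points.

```python
from statistics import NormalDist

# Test scale from note 19: mean 50, inter-student SD 10.
# Assumption: student scores are approximately normally distributed.
STUDENT_SCALE = NormalDist(mu=50, sigma=10)


def percentile_after_gain(points: float) -> float:
    """Percentile rank of a student who starts at the mean and gains `points`."""
    return 100 * STUDENT_SCALE.cdf(50 + points)


for gain in (5, 6):
    print(f"+{gain} points = {gain / 10:.1f} SD -> "
          f"{percentile_after_gain(gain):.1f}th percentile")
```

A 6-point gain lands at about the 72.6th percentile, matching the note's "about the 72nd"; the 5-point end of the range corresponds to about the 69th.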

20. The 72 schools averaged 1.65 inter-school standard deviations above
the mean on each test, which is equivalent to about 3.5 test points,
given standard errors between 1.72 and 2.68. That much of an
average increase corresponds to raising an average child from the
50th percentile to about the 65th.

Are the changes documented in the last two notes large? Two
analogies may help. On most IQ tests, half an inter-student standard
deviation is about 8 points; on the seventh-grade Iowa reading
test, it corresponds to almost a full grade level.
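Note 20 converts between three scales: inter-school standard deviations, test points, and IQ points. Following note 19, one inter-school standard deviation equals the regression's standard error in test points. The IQ standard deviations of 15 and 16 used below are assumptions about common IQ tests, not figures taken from the study.

```python
def to_test_points(k_interschool_sd: float, standard_error: float) -> float:
    """An effect of k inter-school SDs, expressed in test points
    (one inter-school SD = the regression's standard error)."""
    return k_interschool_sd * standard_error


low = to_test_points(1.65, 1.72)
high = to_test_points(1.65, 2.68)
print(f"1.65 inter-school SD = {low:.2f} to {high:.2f} test points")

# Half an inter-student SD on IQ scales; SDs of 15 and 16 are assumptions.
iq_points = {iq_sd: 0.5 * iq_sd for iq_sd in (15, 16)}
for iq_sd, pts in iq_points.items():
    print(f"0.5 SD on an IQ test with SD {iq_sd} = {pts:.1f} points")
```

The range of 2.84 to 4.42 points brackets the note's "about 3.5 test points", and half a standard deviation comes to 7.5 to 8 IQ points.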

21. The number of children tested affects the estimate of the mean
school score, since the standard deviation of $\bar{x}$ is
$\sigma/\sqrt{n}$. The variation in $\bar{x}$ will be larger for smaller
schools, and therefore among the outliers one would expect a more than
proportionate number with small numbers of students tested. However,
the statistical significance of the difference between the top 72
and average schools on number of children tested in fourth grade ...
the difference between $\sqrt{53}$ and $\sqrt{66}$ is not enough to
account for the magnitude of the outliers' overachievement.
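The size of this sampling effect can be gauged with $\sigma/\sqrt{n}$. The sketch below assumes that 53 and 66 are the average numbers of fourth-graders tested in the two groups being compared, which the surviving text implies but does not state outright. It also uses the inter-student standard deviation of 10 from note 19.

```python
from math import sqrt


def sampling_sd(sigma: float, n: int) -> float:
    """Standard deviation of a school mean based on n students: sigma / sqrt(n)."""
    return sigma / sqrt(n)


SIGMA = 10             # inter-student SD of the test scale (note 19)
small, large = 53, 66  # assumption: average numbers tested in the two groups

ratio = sampling_sd(SIGMA, small) / sampling_sd(SIGMA, large)
print(f"SD of school mean, n={small}: {sampling_sd(SIGMA, small):.3f}")
print(f"SD of school mean, n={large}: {sampling_sd(SIGMA, large):.3f}")
print(f"ratio: {ratio:.3f}")
```

The smaller group's school means are only about 12 percent more variable, a modest difference consistent with the note's conclusion.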

22. See the University of the State of New York, New York State
Performance Indicators in Education, 1972 Report (Albany, 1971),
pp. 17-19.

23. The highest average over four tests was 2.92 inter-school standard
deviations, corresponding to less than eight-tenths of an
inter-student standard deviation.

24. Since different regressor and response variables were used, the 

results are not strictly comparable. However, for the same 
reason they may set a more convincing upper limit on the number 
and magnitude of unusually effective schools. 



